CLAIMS

WHAT IS CLAIMED IS:

1. A system for creating an aggregated data model from a plurality of data distribution models,
each data distribution model describing a data distribution having one or more data
elements, each data element having a value, each data distribution model having one or
more bins, each bin comprising a start point having a value, an end point having a value,
a value indicating the number of data elements for each bin, and a polynomial formula
associated with each bin, the polynomial formula approximating the data elements for the
respective bin, comprising:
a processor; and
a computer program executable on a processor, the computer program adapted to
perform the following steps:
(a) determining which start point has the minimum value and which end point
has the maximum value of all of the bins of all of the data distribution
models;
(b) setting a start point of a first bin of the aggregated data model to said start
point determined to have the minimum value;
(c) setting an end point of a last bin of the aggregated data model to said end
point determined to have the maximum value;
(d) determining a total number of a plurality of points for the aggregated data
model by adding the values indicating the number of data elements from
all bins from all data distribution models;
(e) approximating the data elements in the data distribution described by each
data distribution model using the start point, polynomial formula, and
number of data elements for each bin in each respective data distribution
model, each approximated data element comprising one of said points in
the aggregated data model;
(f) sorting the points from minimum to maximum;
(g) distributing the points into one or more bins in the aggregated data model
such that a substantially equal number of points are in each bin of the
aggregated data model; and
(h) determining a polynomial formula with the sorted data elements for each
bin of the aggregated data model.
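
Steps (a) through (h) of claim 1 can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not stated in the claim: each bin is a dictionary with the hypothetical keys `start`, `end`, `count` and `coeffs`; the polynomial gives the offset from the bin's start point as a function of a relative position t between 0 and 1; approximated points are sampled at evenly spaced positions; and step (h) is simplified to a degree-1 least-squares fit. The helper names `eval_poly`, `fit_line` and `aggregate` are likewise hypothetical.

```python
def eval_poly(coeffs, t):
    """Evaluate a polynomial (lowest-degree coefficient first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def fit_line(ts, ys):
    """Degree-1 least-squares fit; returns [intercept, slope] (a simplification)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        return [mean_y, 0.0]
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / var_t
    return [mean_y - slope * mean_t, slope]


def aggregate(models, n_bins):
    """Combine several binned models into one; assumes total points >= n_bins."""
    all_bins = [b for m in models for b in m]
    lo = min(b["start"] for b in all_bins)        # steps (a), (b)
    hi = max(b["end"] for b in all_bins)          # steps (a), (c)
    total = sum(b["count"] for b in all_bins)     # step (d)

    # Step (e): regenerate approximate points from each source bin.
    # Assumed convention: value = start + poly(t), t evenly spaced in (0, 1).
    points = []
    for b in all_bins:
        for i in range(b["count"]):
            t = (i + 0.5) / b["count"]
            points.append(b["start"] + eval_poly(b["coeffs"], t))

    points.sort()                                 # step (f)

    # Step (g): substantially equal counts; extras placed as in claim 3,
    # with INT assumed to truncate.
    base, r = divmod(total, n_bins)
    counts = [base] * n_bins
    for k in range(1, r + 1):
        counts[(n_bins * k) // (r + 1) - 1] += 1

    # Step (h): fit a polynomial (here a line) to the points of each new bin.
    result, pos = [], 0
    for idx, c in enumerate(counts):
        chunk = points[pos:pos + c]
        start = lo if idx == 0 else chunk[0]
        end = hi if idx == n_bins - 1 else chunk[-1]
        ts = [(i + 0.5) / c for i in range(c)]
        offsets = [p - start for p in chunk]
        result.append({"start": start, "end": end, "count": c,
                       "coeffs": fit_line(ts, offsets)})
        pos += c
    return result
```

Under these assumptions, only the per-bin summaries of the source models are needed to build the aggregated model; the original data elements are never read.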


2. The system of claim 1, wherein the computer program is further for determining the end
point for each bin in the aggregated data model.


3. The system of claim 1, wherein the computer program is adapted to perform the step of
distributing the points into the one or more bins of the aggregated data model according
to the following formula:
(a) if the number of points in the aggregated data model is equally divisible
into the number of bins, the end point of the first bin is equal to the value
of the ith point in the aggregated data model, wherein i is the number of
points in each bin determined by dividing the points equally into the
number of bins, wherein the value of the end point of each bin is equal to the ith
point after the last point in the preceding bin, wherein the start point of
each bin is equal to the point after the last point of the previous bin, else
(b) if the number of points in the aggregated data model is not equally divisible by the
number of bins, then the number of points in each bin is determined by
dividing the number of points by the number of bins, and then adding one
to the count of the points in each of a number of bins equal to the
remainder after dividing, wherein the bins that have one added to the count
are determined according to the following formula:
    for k from 1 to r
        binadd = INT((n*k)/(r+1))
    next k
wherein binadd is the sequential bin number to add one to the count of
points to include therein, n is the total number of bins in the aggregated
data model, r is the remainder from dividing the number of points in the
data distribution by the number of bins, and INT is a function for rounding
the result of the bracketed formula to produce an integer result.
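
The remainder rule in part (b) can be made concrete with a short Python sketch. It assumes that INT truncates toward zero (the claim says only that it produces an integer result), that bins are numbered from 1, and that the function name `bin_counts` is a hypothetical label.

```python
def bin_counts(total_points, n_bins):
    """Return the number of points per bin, spreading the remainder per the binadd rule."""
    base, r = divmod(total_points, n_bins)
    counts = [base] * n_bins
    for k in range(1, r + 1):
        # binadd = INT((n*k)/(r+1)), with INT assumed to truncate
        binadd = (n_bins * k) // (r + 1)
        counts[binadd - 1] += 1  # bins are numbered from 1 in the claim
    return counts


print(bin_counts(10, 4))  # remainder 2: bins 1 and 2 each receive one extra point
print(bin_counts(7, 5))   # remainder 2: bins 1 and 3 each receive one extra point
```

Because r is always smaller than n, successive values of n*k/(r+1) differ by at least 1, so under the truncation assumption no bin is selected twice and the counts always sum to the total.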

4. The system of claim 1, wherein the computer program is for performing separately for
each bin of the aggregated data model, the steps of approximating the data elements for
each bin, determining the end point for each bin, and determining the polynomial formula
for each bin.

5. The system of claim 1, wherein each data distribution model is the result of the computer
program performing the following steps:
(a) sorting the data elements from minimum to maximum for each data
distribution;
(b) computing the number of data elements in each data distribution;
(c) determining the value of the start point and the value of the end point of
each bin by dividing the data elements into a plurality of substantially
equal sized bins for each data distribution;
(d) counting the number of data elements in each bin for each data
distribution; and
(e) computing each distribution model for each data distribution, each
distribution model comprising, for each bin, the start point of the bin, the
end point of the bin, and the number of data elements in the bin.




6. The system of claim 5, wherein the computer program is adapted to perform the
following steps for determining the start points and end points of the bins for each data
distribution model:
selecting as the start point of the first bin the value of the data element having the
minimum value in the sorted data distribution;
determining the start point and end point of each bin according to the following
criteria:
(c) if the number of data elements in the data distribution is equally divisible
into the number of bins, the end point of the first bin is equal to the value
of the ith data element in the data distribution, wherein i is the number of
data elements in each bin determined by dividing the data elements
equally into the number of bins, wherein the value of the end point of each
bin is equal to the ith data element after the last data element in the preceding
bin, wherein the start point of each bin is equal to the data element after
the last data element of the previous bin, else
(d) if the number of data elements in the data distribution is not equally
divisible by the number of bins, then the number of data elements in each
bin is determined by dividing the number of data elements by the number
of bins, and then adding one to the count of the data elements in each of a
number of bins equal to the remainder after dividing, wherein the bins that
have one added to the count are determined according to the following
formula:
    for k from 1 to r
        binadd = INT((n*k)/(r+1))
    next k
wherein binadd is the sequential bin number to add one to the count of
data elements to include therein, n is the total number of bins in the data
distribution model, r is the remainder from dividing the number of data
elements in the data distribution by the number of bins, and INT is a
function for rounding the result of the bracketed formula to produce an
integer result.
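
As a rough illustration of how the start and end points of claim 6 follow from a sorted data distribution, the Python sketch below returns one (start point, end point, count) tuple per bin. The function name `bin_edges` is hypothetical, INT is assumed to truncate, and the data distribution is assumed to contain at least as many data elements as bins.

```python
def bin_edges(values, n_bins):
    """Split a data distribution into n_bins and return (start, end, count) per bin."""
    data = sorted(values)
    base, r = divmod(len(data), n_bins)
    counts = [base] * n_bins
    for k in range(1, r + 1):
        # Bins receiving one extra element: binadd = INT((n*k)/(r+1))
        counts[(n_bins * k) // (r + 1) - 1] += 1

    bins, pos = [], 0
    for c in counts:
        start = data[pos]        # element after the last element of the previous bin
        end = data[pos + c - 1]  # c-th element counted from the start of this bin
        bins.append((start, end, c))
        pos += c
    return bins


print(bin_edges([5, 1, 4, 2, 3, 6], 3))  # evenly divisible case
print(bin_edges(range(1, 8), 3))         # remainder of 1 goes to the first bin
```

The start point of the first bin is the minimum value of the sorted distribution, as the claim requires, because the first slice begins at index 0.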

7. The system of claim 6, wherein the computer program is further for performing the step
of counting by counting, for each bin, each data element satisfying the following formula:
    start point < element value <= end point
wherein start point is the start point of the respective bin, element value is the
value of each data element in each bin, and end point is the end point of the respective
bin.
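
The counting rule of claim 7 can be sketched with Python's standard `bisect` module, assuming the data elements are already sorted. The function name `count_in_bin` is hypothetical.

```python
from bisect import bisect_right


def count_in_bin(sorted_values, start, end):
    """Count elements v with start < v <= end in an ascending list."""
    # bisect_right(x) gives the number of elements <= x, so the difference
    # counts exactly the elements in the half-open interval (start, end].
    return bisect_right(sorted_values, end) - bisect_right(sorted_values, start)


values = [1, 2, 3, 4, 5, 6]
print(count_in_bin(values, 2, 4))  # counts 3 and 4
```

Read literally, the strict lower bound means an element equal to a bin's start point is not counted in that bin; the sketch follows the formula as written rather than adjusting for that case.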

8. The system of claim 7, comprising a storage medium for storing each data distribution
model by storing, for each bin, the start point, the end point, the number of data elements,
and the parameters of the polynomial formula.
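
One possible in-memory and on-disk layout for the four stored items per bin is sketched below in Python. The record name `BinRecord`, its field names, and the choice of JSON as the storage format are all assumptions for illustration; the claim does not specify any of them.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BinRecord:
    start: float               # start point of the bin
    end: float                 # end point of the bin
    count: int                 # number of data elements in the bin
    coefficients: List[float]  # parameters of the polynomial formula


def save_model(bins: List[BinRecord]) -> str:
    """Serialize a data distribution model (a list of bins) to a JSON string."""
    return json.dumps([asdict(b) for b in bins])


def load_model(text: str) -> List[BinRecord]:
    """Rebuild the list of bins from a JSON string produced by save_model."""
    return [BinRecord(**record) for record in json.loads(text)]


model = [BinRecord(0.0, 2.5, 10, [0.0, 2.5]), BinRecord(2.6, 9.0, 10, [0.0, 6.4])]
restored = load_model(save_model(model))
```

Storing only these per-bin values, rather than the data elements themselves, keeps the stored model size proportional to the number of bins.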

9. The system of claim 1, wherein the computer program is further for performing one or
more statistical analyses using the aggregated data model.

10. The system of claim 9, wherein the statistical analysis performed comprises determining
the range of the points of the aggregated data model analyzed by subtracting the end point of
the last bin in the aggregated data model from the start point of the first bin in the
aggregated data model.

11. The system of claim 9, wherein the statistical analysis performed comprises determining
the inter quantile range of the points of the aggregated data model.

12. The system of claim 9, wherein the statistical analysis performed comprises determining
the median value of the aggregated data model by determining a number j computed by
dividing the number of bins by 2, and then reading the value of the end point of the jth
bin as the median value if the number of bins in the aggregated data model is equally
divisible by 2 or by reading the interpolated value using the polynomial function of the
mid point of the jth bin if the number of bins in the aggregated data model is not equally
divisible by 2.
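
The median rule of claim 12 can be sketched in Python as follows. The claim leaves two points open, so the sketch makes labeled assumptions: when the number of bins is odd, j is rounded up so that the jth bin is the middle bin; and the polynomial of a bin maps a relative position t between 0 and 1 to a value, so the mid point corresponds to t = 0.5. The dictionary keys `end` and `coeffs` and the function names are hypothetical.

```python
import math


def eval_poly(coeffs, t):
    """Evaluate a polynomial (lowest-degree coefficient first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def model_median(bins):
    """Estimate the median of an aggregated data model from its bins alone."""
    n = len(bins)
    if n % 2 == 0:
        j = n // 2
        return bins[j - 1]["end"]        # end point of the jth bin (1-based)
    j = math.ceil(n / 2)                 # assumption: round j up to the middle bin
    return eval_poly(bins[j - 1]["coeffs"], 0.5)  # assumption: mid point is t = 0.5


even_model = [{"end": 4.0, "coeffs": [1.0, 3.0]}, {"end": 9.0, "coeffs": [5.0, 4.0]}]
odd_model = even_model + [{"end": 12.0, "coeffs": [10.0, 2.0]}]
print(model_median(even_model))
print(model_median(odd_model))
```

Because every bin holds a substantially equal number of points, the middle of the bin sequence is also approximately the middle of the points, which is why the rule can work from bin boundaries alone.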

13. A method for creating an aggregated data model from a plurality of data distribution models,
each data distribution model describing a data distribution having one or more data
elements, each data element having a value, each data distribution model having one or
more bins, each bin comprising a start point having a value, an end point having a value,
a value indicating the number of data elements for each bin, and a polynomial formula
associated with each bin, the polynomial formula approximating the data elements for the
respective bin, the method comprising:
determining which start point has the minimum value and which end point has the
maximum value of all of the bins of all of the data distribution models;
setting a start point of a first bin of the aggregated data model to said start point
determined to have the minimum value;
setting an end point of a last bin of the aggregated data model to said end point
determined to have the maximum value;
determining a total number of a plurality of points for the aggregated data model
by adding the values indicating the number of data elements from all bins from all data
distribution models;
approximating the data elements in the data distribution described by each data
distribution model using the start point, polynomial formula, and number of data
elements for each bin in each respective data distribution model, each approximated data
element comprising one of said points in the aggregated data model;
sorting the points from minimum to maximum;
distributing the points into one or more bins in the aggregated data model such
that a substantially equal number of points are in each bin of the aggregated data model;
and
determining a polynomial formula with the sorted data elements for each bin of
the aggregated data model.

14. The method of claim 13, comprising determining the end point for each bin in the
aggregated data model.

15. The method of claim 13, wherein the step of distributing the points into the one or more
bins of the aggregated data model is performed according to the following formula:
(e) if the number of points in the aggregated data model is equally divisible
into the number of bins, the end point of the first bin is equal to the value
of the ith point in the aggregated data model, wherein i is the number of
points in each bin determined by dividing the points equally into the
number of bins, wherein the value of the end point of each bin is equal to the ith
point after the last point in the preceding bin, wherein the start point of
each bin is equal to the point after the last point of the previous bin, else
(f) if the number of points in the aggregated data model is not equally divisible by the
number of bins, then the number of points in each bin is determined by
dividing the number of points by the number of bins, and then adding one
to the count of the points in each of a number of bins equal to the
remainder after dividing, wherein the bins that have one added to the count
are determined according to the following formula:
    for k from 1 to r
        binadd = INT((n*k)/(r+1))
    next k
wherein binadd is the sequential bin number to add one to the count of
points to include therein, n is the total number of bins in the aggregated
data model, r is the remainder from dividing the number of points in
the data distribution by the number of bins, and INT is a function for
rounding the result of the bracketed formula to produce an integer result.




16. The method of claim 13, wherein the steps of approximating the data elements for each
bin, determining the end point for each bin, and determining the polynomial formula for
each bin are performed separately for each bin of the aggregated data model.


17. The method of claim 13, comprising creating each data distribution model using the
following steps:
(f) sorting the data elements from minimum to maximum for each data
distribution;
(g) computing the number of data elements in each data distribution;
(h) determining the value of the start point and the value of the end point of
each bin by dividing the data elements into a plurality of substantially
equal sized bins for each data distribution;
(i) counting the number of data elements in each bin for each data
distribution; and
(j) computing each distribution model for each data distribution, each
distribution model comprising, for each bin, the start point of the bin, the
end point of the bin, and the number of data elements in the bin.



18. The method of claim 17, comprising determining the start points and end points of the
bins for each data distribution model using the following steps:
selecting as the start point of the first bin the value of the data element having the
minimum value in the sorted data distribution;
determining the start point and end point of each bin according to the following
criteria:
(g) if the number of data elements in the data distribution is equally divisible
into the number of bins, the end point of the first bin is equal to the value
of the ith data element in the data distribution, wherein i is the number of
data elements in each bin determined by dividing the data elements
equally into the number of bins, wherein the value of the end point of each
bin is equal to the ith data element after the last data element in the preceding
bin, wherein the start point of each bin is equal to the data element after
the last data element of the previous bin, else
(h) if the number of data elements in the data distribution is not equally
divisible by the number of bins, then the number of data elements in each
bin is determined by dividing the number of data elements by the number
of bins, and then adding one to the count of the data elements in each of a
number of bins equal to the remainder after dividing, wherein the bins that
have one added to the count are determined according to the following
formula:
    for k from 1 to r
        binadd = INT((n*k)/(r+1))
    next k
wherein binadd is the sequential bin number to add one to the count of
data elements to include therein, n is the total number of bins in the data
distribution model, r is the remainder from dividing the number of data
elements in the data distribution by the number of bins, and INT is a
function for rounding the result of the bracketed formula to produce an
integer result.

19. The method of claim 18, wherein the step of counting is performed by counting, for each
bin, each data element satisfying the following formula:

    start point < element value <= end point
wherein start point is the start point of the respective bin, element value is the
value of each data element in each bin, and end point is the end point of the respective
bin.

20. The method of claim 19, comprising storing each data distribution model by storing, for 
each bin, the start point, the end point, the number of data elements, and the parameters of 
the polynomial formula. 

21. The method of claim 13, comprising performing one or more statistical analyses using the
aggregated data model.

22. The method of claim 21, wherein the statistical analysis performed comprises
determining the range of the points of the aggregated data model analyzed by subtracting
the end point of the last bin in the aggregated data model from the start point of the first bin
in the aggregated data model.

23. The method of claim 21, wherein the statistical analysis performed comprises 
determining the inter quantile range of the points of the aggregated data model. 

24. The method of claim 21, wherein the statistical analysis performed comprises
determining the median value of the aggregated data model by determining a number j
computed by dividing the number of bins by 2, and then reading the value of the end
point of the jth bin as the median value if the number of bins in the aggregated data
model is equally divisible by 2 or by reading the interpolated value using the polynomial
function of the mid point of the jth bin if the number of bins in the aggregated data model
is not equally divisible by 2.